Parallel Machine Learning Using Concurrency Control
Author: Xinghao Pan
Abstract
By Xinghao Pan, Doctor of Philosophy in Computer Science and the Designated Emphasis in Communications, Computation and Statistics, University of California, Berkeley. Professor Michael I. Jordan, Chair.

Many machine learning algorithms iteratively process datapoints and transform global model parameters. As processor speeds fail to keep up with the growth in dataset sizes, it has become increasingly impractical to execute such iterative algorithms serially. To address this problem, the machine learning community has turned to two parallelization strategies: bulk synchronous parallel (BSP) and coordination-free. BSP algorithms partition computational work among workers, with occasional synchronization at global barriers, but have only been applied to ‘embarrassingly parallel’ problems where the work is trivially factorizable. Coordination-free algorithms simply allow concurrent processors to execute in parallel, interleaving transformations and possibly introducing inconsistencies. Theoretical analysis is then required to prove that the coordination-free algorithm produces a reasonable approximation of the desired outcome, under assumptions on the problem and the system.

In this dissertation, we propose and explore a third approach: applying concurrency control to manage parallel transformations in machine learning algorithms. We identify points of possible interference between parallel iterations by examining the semantics of the serial algorithm. Coordination is then introduced to either avoid or resolve such conflicts, while non-conflicting transformations are allowed to execute concurrently. Our parallel algorithms are thus engineered to produce exactly the same output as the serial machine learning algorithm, preserving the serial algorithm’s theoretical guarantees of correctness while maximizing concurrency. We demonstrate the feasibility of our approach by parallelizing a variety of machine learning algorithms, including nonparametric unsupervised learning, graph clustering, discrete optimization, and sparse convex optimization. We theoretically prove and empirically verify that our parallel algorithms produce output equivalent to their serial counterparts. We also theoretically analyze the expected concurrency of our parallel algorithms and empirically demonstrate their scalability.
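To make the approach concrete, the following is a minimal, hypothetical sketch of pessimistic concurrency control for a parallel iterative update, assuming each transformation only reads and writes a small, known set of model coordinates. The names (LockedModel, transform, the toy additive update) are illustrative and not taken from the dissertation: transformations that touch overlapping coordinates are serialized by per-coordinate locks, while transformations on disjoint coordinates proceed concurrently.

```python
# Hypothetical sketch of pessimistic concurrency control for parallel
# iterative updates. Names and the toy update rule are illustrative only.
import threading
from concurrent.futures import ThreadPoolExecutor

class LockedModel:
    """Global model parameters guarded by one lock per coordinate."""
    def __init__(self, num_coords):
        self.params = [0.0] * num_coords
        self.locks = [threading.Lock() for _ in range(num_coords)]

    def transform(self, coords, update_fn):
        # Acquire locks in sorted order to avoid deadlock. Transformations
        # touching disjoint coordinate sets run fully in parallel; only
        # conflicting transformations are serialized.
        for c in sorted(coords):
            self.locks[c].acquire()
        try:
            for c in coords:
                self.params[c] = update_fn(c, self.params[c])
        finally:
            for c in coords:
                self.locks[c].release()

def process(model, datapoint):
    # Each datapoint can only interfere with the coordinates it touches.
    model.transform(datapoint["coords"], lambda c, v: v + datapoint["step"])

model = LockedModel(num_coords=8)
data = [{"coords": [0, 3], "step": 1.0},
        {"coords": [5], "step": -0.5},
        {"coords": [3, 5], "step": 2.0}]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda d: process(model, d), data))
print(model.params)
```

Because the toy additive updates commute, this locked parallel execution yields the same final parameters as a serial pass over the data; the dissertation establishes this kind of serial equivalence for much richer transformations.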
Similar papers
Forward kinematic analysis of planar parallel robots using a neural network-based approach optimized by machine learning
The forward kinematic problem of parallel robots has long been considered a challenge in the field due to the resulting nonlinear system of equations. In this paper, the forward kinematic problem of planar parallel robots in their workspace is investigated using a neural-network-based approach. In order to increase the accuracy of this method, the workspace of the parallel robo...
Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on one of two extremes—algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this “opt...
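In the same spirit, here is a small, hypothetical validate-then-commit sketch of the optimistic pattern described above, not the paper's actual protocol: workers compute updates lock-free from a snapshot, conflicts are detected at commit time by comparing per-coordinate versions, and a simple retry serves as the conflict-resolution step. OptimisticModel, try_commit, and optimistic_update are assumed names.

```python
# Hypothetical optimistic concurrency control sketch: lock-free compute,
# version-based validation at commit, retry on conflict. Illustrative only.
import threading

class OptimisticModel:
    def __init__(self, num_coords):
        self.params = [0.0] * num_coords
        self.versions = [0] * num_coords        # bumped on every commit
        self._commit_lock = threading.Lock()    # guards snapshot/commit only

    def read(self, coords):
        # Take a consistent snapshot of (value, version) for each coordinate.
        with self._commit_lock:
            return {c: (self.params[c], self.versions[c]) for c in coords}

    def try_commit(self, snapshot, new_values):
        with self._commit_lock:
            # Validate: fail if any read coordinate changed since the snapshot.
            if any(self.versions[c] != v for c, (_, v) in snapshot.items()):
                return False
            for c, val in new_values.items():
                self.params[c] = val
                self.versions[c] += 1
            return True

def optimistic_update(model, coords, step):
    while True:                                  # retry is the conflict resolution
        snap = model.read(coords)
        proposal = {c: val + step for c, (val, _) in snap.items()}
        if model.try_commit(snap, proposal):
            return

model = OptimisticModel(num_coords=4)
threads = [threading.Thread(target=optimistic_update, args=(model, [0, 2], 1.0))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model.params)   # [8.0, 0.0, 8.0, 0.0] regardless of interleaving
```

When conflicts are rare, almost every update commits on its first attempt, so the expensive computation runs without any coordination; only the brief validate-and-write step is serialized.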
Parallel Double Greedy Submodular Maximization
Many machine learning problems can be reduced to the maximization of submodular functions. Although well understood in the serial setting, the parallel maximization of submodular functions remains an open area of research with recent results [1] only addressing monotone functions. The optimal algorithm for maximizing the more general class of non-monotone submodular functions was introduced by ...
On Mining Concurrency Defect-Related Reports from Bug Repositories
We present early findings of two ongoing case studies in which we automatically extract reports about concurrency defects from the MySQL and Apache bug repositories. To mine the unstructured reports, we apply keyword search and machine learning, using linear and non-linear classifiers. We analyze the results in detail and suggest some improvements for this mining task. Automated Bug Report Clas...
Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect
This paper deals with determining the number of machines and the production schedules in manufacturing environments. In this line, a two-stage fuzzy-stochastic programming model with fuzzy processing times is discussed, where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...
Publication year: 2017